Qualcomm® AI Hub
Qualcomm AI Hub contains a large collection of pretrained AI models that are optimized to run on the NPU on Dragonwing hardware.
Finding supported models
Models in AI Hub are categorized by supported Qualcomm chipset. To see models that will run on your development kit:
1️⃣ Go to the model list.
2️⃣ Under 'Chipset', select:
- RB3 Gen 2 Vision Kit: 'Qualcomm QCS6490 (Proxy)'
- RUBIK Pi 3: 'Qualcomm QCS6490 (Proxy)'
Deploying a model to the NPU (Python)
As an example, let's deploy the Lightweight-Face-Detection model.
Running the example repository
All AI Hub models come with an example repository. This is a good starting point, as it shows exactly how to run the model: what the input to your network should look like, and how to interpret the output (here, mapping the output tensor to bounding boxes). The example repositories do NOT run on the NPU or GPU yet - they run without acceleration. Let's see what our input/output should look like before we move this model to the NPU.
On the AI Hub page for Lightweight-Face-Detection, click "Model repository". This links you to a README file with instructions on how to run the example repository.
To deploy this model, open a terminal on your development board (or an ssh session to it) and:
1️⃣ Create a new venv and install some base packages:
mkdir -p ~/aihub-demo
cd ~/aihub-demo
python3 -m venv .venv
source .venv/bin/activate
pip3 install numpy setuptools Cython shapely
2️⃣ Download an image with a face (640x480 resolution, JPG format) onto your development board, e.g. via:
wget https://cdn.edgeimpulse.com/qc-ai-docs/example-images/three-people-640-480.jpg

Input resolution: AI Hub models require correctly sized inputs. You can find the required resolution under "Technical Details > Input resolution"; note that it is listed as HEIGHT x WIDTH (here 480x640, i.e. 640x480 in width x height). Alternatively, inspect the size of the input tensor in the TFLite or ONNX file.
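As a sketch of that second option, the snippet below prints the input tensor shape. The file names model.tflite / model.onnx are placeholders for whatever you downloaded from AI Hub, and it assumes the tflite-runtime and onnxruntime packages are installed:
# TFLite reports shapes as (batch, height, width, channels)
from tflite_runtime.interpreter import Interpreter
interpreter = Interpreter(model_path='model.tflite')
print('TFLite input shape:', interpreter.get_input_details()[0]['shape'])

# ONNX reports shapes as (batch, channels, height, width)
import onnxruntime as ort
session = ort.InferenceSession('model.onnx')
print('ONNX input shape:', session.get_inputs()[0].shape)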
3️⃣ Follow the instructions under 'Example & Usage' for the Lightweight-Face-Detection model:
# Install the example (add --no-build-isolation)
pip3 install --no-build-isolation "qai-hub-models[face-det-lite]"
# Run the example
#    Use --help to see all options
python3 -m qai_hub_models.models.face_det_lite.demo --quantize w8a8 --image ./three-people-640-480.jpg --output-dir out/
You can find the output image in out/FaceDetLitebNet_output.png.
If you're connected over ssh, you can copy the output image back to your host computer via:
# Find IP via: ifconfig | grep -Eo 'inet (addr:)?([0-9]*\.){3}[0-9]*' | grep -Eo '([0-9]*\.){3}[0-9]*' | grep -v '127.0.0.1'
# Then: (replace 192.168.1.148 with the IP address of your development kit)
scp ubuntu@192.168.1.148:~/aihub-demo/out/FaceDetLitebNet_output.png ~/Downloads/FaceDetLitebNet_output.png

4️⃣ Alright! We have a working model. For reference, on the RUBIK Pi 3, running this model takes 189.7ms per inference.
Porting the model to the NPU
Now that we have a working reference model, let's run it on the NPU. There are three parts that you need to implement.
1️⃣ You need to preprocess the data, e.g. convert the image into features that you can pass to the neural network.
2️⃣ You need to export the model to ONNX or TFLite, and run the model through LiteRT or ONNX Runtime.
3️⃣ You need to postprocess the output, e.g. convert the output of the neural network to bounding boxes of faces.
Running the model itself is straightforward, as you can read on the LiteRT and ONNX Runtime pages. However, the pre- and post-processing code might not be...
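As a minimal sketch of step 2️⃣ (running the exported model), assuming you have a quantized TFLite export named face_det_lite.tflite and the Qualcomm QNN TFLite delegate (libQnnTFLiteDelegate.so) available on the board (the delegate name and options here are assumptions; check the LiteRT page for the exact ones for your device):
import time
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Offload the model to the NPU via the QNN delegate (file name and options are assumptions)
delegate = load_delegate('libQnnTFLiteDelegate.so', options={'backend_type': 'htp'})
interpreter = Interpreter(model_path='face_det_lite.tflite', experimental_delegates=[delegate])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]

# Dummy input with the correct shape/dtype, just to verify the pipeline runs
dummy = np.zeros(input_details['shape'], dtype=input_details['dtype'])
interpreter.set_tensor(input_details['index'], dummy)

start = time.perf_counter()
interpreter.invoke()
print(f'Inference took {(time.perf_counter() - start) * 1000:.1f} ms')

for d in interpreter.get_output_details():
    print(d['name'], interpreter.get_tensor(d['index']).shape)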
Preprocessing inputs
Most AI Hub image models take a matrix of (HEIGHT, WIDTH, CHANNELS) (LiteRT) or (CHANNELS, HEIGHT, WIDTH) (ONNX), scaled from 0..1. If the model has 1 channel, convert the image to grayscale first. If your model is quantized (most likely), you'll also need to read the zero_point and scale values and scale the pixels accordingly (this is easy in LiteRT, as TFLite files contain the quantization parameters; ONNX files do not). Typically you'll end up with data scaled linearly to 0..255 (uint8) or -128..127 (int8) for quantized models - so that's relatively easy. A function that demonstrates all this in Python can be found below in the example code (def load_image_litert).
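As a rough illustration of those steps (a sketch, not the exact load_image_litert from the example code below), assuming a single-channel, quantized LiteRT model and that Pillow is installed:
import numpy as np
from PIL import Image

def load_image_for_litert(path, input_details):
    # 'input_details' is one entry from interpreter.get_input_details()
    _, height, width, channels = input_details['shape']

    img = Image.open(path).resize((width, height))
    if channels == 1:
        img = img.convert('L')                          # grayscale for 1-channel models
    arr = np.asarray(img, dtype=np.float32) / 255.0     # scale to 0..1
    if channels == 1:
        arr = arr[..., np.newaxis]                      # (H, W) -> (H, W, 1)

    # For quantized models, map the 0..1 floats into the integer range via scale/zero_point
    scale, zero_point = input_details['quantization']
    if scale != 0:
        arr = arr / scale + zero_point
    arr = arr.astype(input_details['dtype'])

    return arr[np.newaxis, ...]                         # add batch dimension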
HOWEVER... This is not guaranteed, and this is where the AI Hub example code comes in. Every AI Hub example contains the exact code used to scale inputs. In our current example - Lightweight-Face-Detection - the input is shaped (480, 640, 1). However, if you look at the preprocessing code, the data is not converted to grayscale; instead, only the blue channel of an RGB image is taken:
img_array = img_array.astype("float32") / 255.0   # scale pixels to 0..1
img_array = img_array[np.newaxis, ...]            # add batch dimension
img_tensor = torch.Tensor(img_array)              # convert to a PyTorch tensor
img_tensor = img_tensor[:, :, :, -1]              # HERE WE TAKE THE BLUE CHANNEL, NOT CONVERT TO GRAYSCALE
These kinds of things are very easy to get wrong. So if you see non-matching results between your implementation and the AI Hub example: read the code. This applies even more to non-image inputs (e.g. audio). Use the demo code to understand what the model actually expects.
Postprocessing outputs
The same applies to postprocessing. For example, there's no standard way of mapping the output of a neural network to bounding boxes (to detect faces). For Lightweight-Face-Detection you can find the code here: face_det_lite/app.py#L77.
If you're targeting Python, it's often easiest to copy the postprocessing code into your application, as the qai-hub-models package has a lot of dependencies that you might not want. In addition, the postprocessing code operates on PyTorch tensors, while your inference runs under LiteRT or ONNX Runtime, so you'll need to change a few small things. We'll show this just below in the end-to-end example.
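For example, a pattern that often works (a sketch with illustrative names, not the exact code from the example below) is to dequantize the LiteRT outputs to float and wrap them in torch tensors, so the copied PyTorch-based postprocessing runs largely unchanged:
import numpy as np
import torch

def litert_output_to_torch(interpreter, output_detail):
    # 'output_detail' is one entry from interpreter.get_output_details()
    arr = interpreter.get_tensor(output_detail['index'])
    scale, zero_point = output_detail['quantization']
    if scale != 0:                                       # dequantize int8/uint8 outputs
        arr = (arr.astype(np.float32) - zero_point) * scale
    return torch.from_numpy(np.ascontiguousarray(arr))

# The resulting tensors can then be fed into postprocessing code copied from
# face_det_lite/app.py, with small adjustments where it relies on model-specific wrappers.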